The exponential growth of data in recent years has given rise to the fields of Data Analytics and Data Science. Today, data science is applied across almost every industry: in biotech to develop next-generation vaccines, in fintech to find optimal trading models and payment systems, in social networks to understand how users interact, and so on. Data has outgrown traditional analysis tools, so new data science techniques had to be developed that let machines carry out their own analysis and forecasts.
This unit introduces Data Science through a practical approach. It covers data collection, cleansing, storage and visualization, then modeling the data and putting the resulting model into production. Most techniques will be carried out in Python. Statistics such as correlation and hypothesis testing, as well as visualizations, will be done both with Python libraries and with DAX in Power BI. Topics covered in this unit include:
- Regression and Prediction
- Classification, Hypothesis Testing and Deep Learning
- Recommendation Systems
- Predictive Modelling for Temporal Data
Main Reading List
- Machine Learning Mastery with Python. Understand Your Data, Create Accurate Models, and Work Projects End-to-End. Jason Brownlee.
- Data Warehousing:
  - The Data Warehouse Toolkit (Second Edition). Ralph Kimball and Margy Ross.
- Visualization:
  - The Definitive Guide to DAX. Business Intelligence with Microsoft Excel, SQL Server Analysis Services, and Power BI. Marco Russo and Alberto Ferrari. (The best book to learn DAX)
  - Information is Beautiful (2012). David McCandless. Contains many infographics and visualization examples. A follow-up book by the same author, Knowledge is Beautiful (2014), is also available.
- Statistics:
  - Statistical Methods for Machine Learning. Discover How to Transform Data into Knowledge with Python. Jason Brownlee.
  - The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2011). Trevor Hastie, Robert Tibshirani, Jerome Friedman.